ABSTRACT

For a general link linear model (GLLM), we show that the OLS estimate of
the slope vector is strongly consistent up to a multiplicative scale, even
though  the  model  might  actually  be  nonlinear.  Furthermore,  the  estimated 
slope  vector  is  strongly  consistent  for  the  average  slope  vector,  the  average 
of  the  pointwise  slope  vectors  on  the  response  surface.  For  a  GLLM  with  a 
completely  specified  link  function,  we  can  solve  for  the  multiplicative  scalar 
and  estimate  the  true  slope  vector,  and  estimate  the  intercept  and  Cox  and 
Snell's  generalized  residuals.  We  then  estimate  the  response  surface  and  the 
pointwise  slopes  using  a  generalization  of  the  smearing  estimate  in  Duan 
(1983).  The  results  can  be  applied  to  a  number  of  important  subclasses  of 
GLLM,  including  general  transformation  models,  general  scaled  transformation 
models,  generalized  linear  models,  dichotomous  regression,  and  Tobit 
regression. 
SIGNIFICANCE AND EXPLANATION

We justify the OLS estimation for regression models on a fundamental
level: the method gives meaningful results even when the model is grossly
misspecified. The OLS estimates are easy to implement, and can serve as
initial estimates for iterative algorithms to derive efficient estimates.


THE  ORDINARY  LEAST  SQUARES  ESTIMATION 
FOR  THE  GENERAL-LINK  LINEAR  MODELS,  WITH  APPLICATIONS 
1. INTRODUCTION

1.1 General-Link Linear Model (GLLM)

We consider a general class of regression models which relate the dependent variable
$y$ to the regression (independent) variables $x$, considered as a row vector:

    $y_i = g(\beta_0 + x_i\beta, \epsilon_i), \quad i = 1, \ldots, n.$    (1.1)

We will call those models the general-link linear models (GLLM), the function $g$ the
general link function (GLF), the parameter vector $\beta$ the slope vector, and the linear
combination $x_i\beta$ the linear component.

Depending on the specific application, the GLF $g$ might be completely unspecified,
partially specified, or completely specified. Note that if the GLF is completely
unspecified, the intercept $\beta_0$ is unidentified, and the slope vector $\beta$ is identified
only up to a multiplicative scale:

Observation 1. For any location and scale adjustments on $\beta_0 + x_i\beta$, we can always find a
GLF $g^*$ which satisfies the following:

    $g[a + b(\beta_0 + x_i\beta), \epsilon_i] = g^*(\beta_0 + x_i\beta, \epsilon_i)$    (1.2)

for any $a$ and $b$. □
When the GLF is partially specified, observation 1 may or may not hold. We will call
a subclass of GLLM's identified (unidentified) in location and scale depending on whether
observation 1 is violated (satisfied) for the subclass.

The GLLM includes many important classes of regression models as special cases. Some
important examples are given below. (For all the examples, we assume that the error terms
$\epsilon_i$ are independently and identically distributed according to $F(\epsilon)$.)

Example 1.1. General transformation models. Assume the GLF has the following form:

    $y_i = g(\beta_0 + x_i\beta + \epsilon_i).$    (1.3)

We will call this class of GLLM's the general transformation models. If the GLF is
invertible, we can transform $y$ into a linear model:

    $g^{-1}(y_i) = \beta_0 + x_i\beta + \epsilon_i,$

which is the usual specification for transformation models. However, we do not require the
GLF $g$ to be invertible.

Example 1.2. Dichotomous regression models. If the GLF in the general transformation
model (1.3) is dichotomous,

    $g(t) = 1$ if $t > 0$,

    $g(t) = 0$ if $t \le 0$,

we have a dichotomous regression model. If we further assume that the error distribution
$F(\epsilon)$ is logistic (normal), we have the logistic (probit) regression model. Note that the
dichotomous GLF is not invertible.

Example 1.3. Tobit regression model. Assume in (1.3) that the GLF is the censoring
function:

    $g(t) = t$ if $t > 0$,

    $g(t) = 0$ if $t \le 0$,

i.e.,    $y_i = \max(\beta_0 + x_i\beta + \epsilon_i, 0),$    (1.4)

we have the Tobit regression model. (See, e.g., Maddala, 1983, p. 151.) Note that the
censoring GLF (1.4) is not invertible.
Example 1.4. Additive-error models. Assume that the GLF is additive; we have

    $y_i = g(\beta_0 + x_i\beta) + \epsilon_i.$    (1.5)

Example 1.5. Generalized linear models. Assume the following form for the GLF:

    $y_i = \nu(\beta_0 + x_i\beta) + \sigma(\beta_0 + x_i\beta) \cdot \epsilon_i.$    (1.6)

This is analogous to the generalized linear model with link function $\nu$ and variance
function $\sigma^2$. Note that our use of the term link function is different from McCullagh and
Nelder (1983). The link function $\eta = \eta(\mu)$ in McCullagh and Nelder is the inverse of $\nu$.

Example 1.6. General scaled transformation models. Efron (1983) considers a rich class of
transformation models:

    $y_i = g(\nu(\eta_i) + \sigma(\eta_i) \cdot \epsilon_i).$

If we assume a linear model for $\eta$, we have

    $y_i = g(\nu(\beta_0 + x_i\beta) + \sigma(\beta_0 + x_i\beta) \cdot \epsilon_i);$    (1.7)

we will follow Efron's terminology and refer to this class of models as the general scaled
transformation models. This class includes both the general transformation model (1.3) and
the generalized linear models (1.6) as special cases.

Example 1.7. A combined error model. Assume that the latent variable $\eta$ has a linear
model on the logarithmic scale with behavioral error $\epsilon_1$:

    $\eta_i = \exp(\beta_0 + x_i\beta + \epsilon_{1i}),$

while the observed dependent variable measures the latent variable with measurement error
$\epsilon_2$:

    $y_i = \exp(\beta_0 + x_i\beta + \epsilon_{1i}) + \epsilon_{2i}.$

This model belongs to the GLLM, but does not belong to any of the subclasses discussed
above. (We do not require the error term $\epsilon$ to be one-dimensional.)

Example 1.8. One-parameter families. Consider any one-parameter family of probability
distributions:

    $y_i \sim F(y_i; \theta_i).$

If we assume a linear model $\theta_i = \beta_0 + x_i\beta$, we have the following GLLM:

    $y_i = F^{-1}(\epsilon_i; \beta_0 + x_i\beta),$

where $\epsilon_i \sim U(0,1)$, and $F^{-1}$ is the inverse of $F(\cdot\,; \theta)$ for a fixed $\theta$. For discontinuous
distributions, we define $F^{-1}(\epsilon) = \min\{y^* : \epsilon \le F(y^*)\}$.
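
As an illustration of the generalized inverse for a discontinuous distribution, the following Python sketch builds the GLF of a Poisson family. The function names `poisson_cdf` and `poisson_glf` are hypothetical, and we assume that the linear component is positive (so that it is a valid Poisson mean) and that the error term lies strictly between 0 and 1.

```python
import math


def poisson_cdf(y, theta):
    """F(y; theta) for a Poisson distribution with mean theta > 0."""
    return sum(math.exp(-theta) * theta**k / math.factorial(k) for k in range(y + 1))


def poisson_glf(eta, eps):
    """GLF g(eta, eps) = F^{-1}(eps; eta): the smallest integer y with eps <= F(y; eta)."""
    y = 0
    while poisson_cdf(y, eta) < eps:
        y += 1
    return y


# Hypothetical value 2.0 for the linear component, and three values of the uniform error.
draws = [poisson_glf(2.0, eps) for eps in (0.1, 0.5, 0.9)]
```

Feeding independent U(0,1) draws into `poisson_glf` at a fixed linear component would reproduce Poisson variates with that mean, which is the sense in which the one-parameter family is a GLLM.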


1.2 OLS Estimation in GLLM

We have a general result on the strong consistency of the OLS estimate for the slope
vector $\beta$ in GLLM.

Theorem 1. Consider the OLS slope coefficients

    $b = (X'QX)^{-1}X'Qy$    (1.8)

for the GLLM (1.1), where $Q = I - 1(1'1)^{-1}1'$. We have

    $b \to \gamma\beta$ (a.s.),    (1.9)

    $\gamma = \mathrm{Cov}(x\beta, y)/\mathrm{Var}(x\beta),$    (1.10)


under the following assumptions.

(1a) The GLF $g$ is measurable.

(1b) The regressor variables $x_i$ and error terms $\epsilon_i$ are independently and
identically distributed according to the c.d.f. $M(x)F(\epsilon)$. (Thus $x$ and $\epsilon$
are stochastically independent.)

(1c) The moments $E(x) = \mu$, $\mathrm{Cov}(x) = \Sigma$, and $\mathrm{Cov}(x,y)$ exist; $\Sigma$ is nonsingular.

(1d) The distribution $M(x)$ of the regressor variables $x$ is spherically symmetric
centered at $\mu$ with respect to the inner product $\langle v, w \rangle = v'\Sigma w$: for any
matrix $A$ such that $A'\Sigma A = \Sigma$, the rotated regressor variables
$\rho = (x-\mu)A$ have the same distribution as the centered regressor
variables $x - \mu$.


The spherical symmetry condition (1d) can be replaced by the following polynomial
normality conditions.

(1e) The GLF $g$ is a polynomial in $x\beta$ of degree $k$.

(1f) The regressor variables can be orthonormalized to be stochastically independent;
i.e., there exists a square matrix $A$ such that $A\Sigma A' = I$, and the
orthonormalized regressor variables $\{(x-\mu)A'\}$ are mutually independent.

(1g) The first $k + 1$ cumulants of all the orthonormalized regressor variables are
identical to those of the standard normal variate. □

(Proof) By the strong law of large numbers and conditions (1b), (1c), we have

    $b \to \Sigma^{-1}\Sigma_{xy}$ (a.s.),

where $\Sigma_{xy} = \mathrm{Cov}(x,y)$. The rest of the proof is given in section two. □
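
The content of theorem 1 can be illustrated by a small simulation. The sketch below is only a numerical check under assumptions of our own choosing: standard normal regressors and errors, hypothetical parameter values, and the censoring and dichotomous GLF's of examples 1.2 and 1.3. The helper names `ols_slope` and `gamma_hat` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta0 = 0.5
beta = np.array([1.0, -2.0, 0.5])

x = rng.standard_normal((n, 3))   # multinormal regressors, hence spherically symmetric
eps = rng.standard_normal(n)      # error terms, independent of x
eta = beta0 + x @ beta            # linear component


def ols_slope(x, y):
    """OLS slope coefficients b = (X'QX)^{-1} X'Qy, computed by centering."""
    xc = x - x.mean(axis=0)
    return np.linalg.solve(xc.T @ xc, xc.T @ (y - y.mean()))


def gamma_hat(x, y, beta):
    """Sample version of Cov(x beta, y) / Var(x beta)."""
    xb = x @ beta
    return np.cov(xb, y)[0, 1] / np.var(xb, ddof=1)


y_tobit = np.maximum(eta + eps, 0.0)         # censoring GLF
y_dich = (eta + eps > 0).astype(float)       # dichotomous GLF

ratio_tobit = ols_slope(x, y_tobit) / beta   # every entry should be close to gamma
ratio_dich = ols_slope(x, y_dich) / beta
g_tobit = gamma_hat(x, y_tobit, beta)
g_dich = gamma_hat(x, y_dich, beta)
```

In both cases the componentwise ratios of the OLS slopes to the true slopes should be roughly constant and close to the sample value of the multiplicative scalar, although neither model is linear.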

Remark 1.1. The result in theorem 1 does not apply to the intercept $\beta_0$; it also leaves
the multiplicative scalar $\gamma$ to be determined. For completely unspecified GLLM's, this is
the best that can be achieved: according to observation 1, the intercept $\beta_0$ and the
multiplicative scalar $\gamma$ are not identified and can be absorbed into the GLF $g$. The same
comment applies to partially specified GLLM's which are unidentified in location and scale,
i.e., for which observation 1 is valid.

For subclasses of GLLM unidentified in location and scale, we will consider the
estimation of the response surface $E(y|x)$ in section 1.3. Note that while the
intercept $\beta_0$ and the multiplicative scalar $\gamma$ are not identified, the response surface
is invariant to location and scale adjustments in the linear component $x\beta$.

For a subclass of GLLM identified in location and scale, it is usually possible to
estimate the intercept $\beta_0$ and the multiplicative scalar $\gamma$. We give the results in
section 3 for GLLM's with completely specified GLF's.

Remark 1.2. It is well known that the OLS estimate for linear models is robust to
perturbations in the error distribution in the sense that under assumptions (1b) and (1c),
the OLS estimate is strongly consistent. (Note that for a linear model assumption (1c)
implies that $E(\epsilon)$ exists; since we are not concerned with estimating the intercept in
theorem 1, we do not require $E(\epsilon) = 0$.) Theorem 1 indicates that the OLS estimate for
GLLM is robust both to perturbations in the GLF and to perturbations in the error
distribution, in the sense that asymptotically the OLS gives the correct answer in the sense
of (1.9).

Remark 1.3. Beyond strong consistency, we usually like to find efficient or nearly
efficient estimates if the model is sufficiently specified. The OLS estimate, despite its
strong consistency, is unlikely to be efficient except in some special cases. In a
sufficiently specified model, we might be able to find more efficient estimators, e.g.,
using adaptive methods. In those situations, it is usually desirable to use a consistent
estimate as the starting value for numerical algorithms aimed at finding efficient
estimates. Theorem 1 indicates that the OLS estimates (up to a multiplicative
scalar) can be used as the starting value. In a completely specified model, it is usually
possible to determine or estimate the multiplicative scalar. (See section 3.)

Researchers at the Rand Corporation have been using the OLS estimates to derive
starting values for logistic regression models since 1974 and for probit regression models
since 1980. Generally speaking the results are very satisfactory. For both models, a
stringent convergence criterion that the conditional log-likelihood move by no more than
0.01 is satisfied within two or three iterations.
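
The use of OLS estimates as starting values can be sketched as follows for logistic regression. This is a minimal illustration, not the procedure used by the Rand researchers: the simulated data, the function names, and in particular the rescaling of the OLS slopes by the sample value of $\bar{y}(1-\bar{y})$ are assumptions of ours. The stopping rule mirrors the criterion that the log-likelihood move by no more than 0.01.

```python
import numpy as np


def log_likelihood(theta, X, y):
    """Logistic log-likelihood; logaddexp(0, z) is a stable log(1 + exp(z))."""
    z = X @ theta
    return float(np.sum(y * z - np.logaddexp(0.0, z)))


def newton_logistic(X, y, theta, tol=0.01, max_iter=25):
    """Newton-Raphson iterations; stop when the log-likelihood moves by at most tol."""
    ll = log_likelihood(theta, X, y)
    for iteration in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        gradient = X.T @ (y - p)
        hessian = (X * (p * (1.0 - p))[:, None]).T @ X
        theta = theta + np.linalg.solve(hessian, gradient)
        new_ll = log_likelihood(theta, X, y)
        if abs(new_ll - ll) <= tol:
            return theta, iteration
        ll = new_ll
    return theta, max_iter


# Simulated data with hypothetical parameters (intercept first, then slopes).
rng = np.random.default_rng(1)
n = 5000
true_theta = np.array([0.3, 1.0, -0.5])
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_theta)))).astype(float)

# OLS fit of the 0/1 response, then a heuristic rescaling (our assumption).
b = np.linalg.lstsq(X, y, rcond=None)[0]
ybar = y.mean()
start = np.concatenate([[np.log(ybar / (1.0 - ybar))], b[1:] / (ybar * (1.0 - ybar))])

theta_hat, n_iter = newton_logistic(X, y, start)
```

The variable `n_iter` records how many Newton steps were needed from the OLS-based start; it can be compared with the count obtained from an arbitrary start such as the zero vector.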

Remark 1.4. If the multiplicative scalar $\gamma$ is zero, there is no information in $y$ for
$\beta$ on the linear scale; we might still be able to estimate $\beta$ by transforming $y$ to an
alternative scale on which $\gamma$ is nonzero. Throughout this paper we will assume $\gamma$ is
nonzero. See remark 1.8 for a testing procedure for this hypothesis for the general
transformation model.

Remark 1.5. As a special case of the result based on the polynomial normality conditions,
consider a GLF which is quadratic in $x_i\beta$. The strong consistency result (1.9) holds if
all orthonormalized regressor variables are symmetric. This will be true, e.g., in a
factorial experiment.
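
The quadratic case can be checked exactly on a small design. The sketch below is an illustration under assumptions of our own: a full $2^3$ factorial design with levels $\pm 1$, treated as the whole population of design points, hypothetical parameter values, and a quadratic GLF with the error term omitted so that the computation is deterministic.

```python
import itertools

import numpy as np

# Full 2^3 factorial design: the factors are symmetric, uncorrelated, with unit variance.
design = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))

beta0 = 1.5
beta = np.array([1.0, -0.5, 2.0])
y = (beta0 + design @ beta) ** 2          # quadratic GLF, no error term

X = np.column_stack([np.ones(len(design)), design])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
slope = coef[1:]                          # OLS slope coefficients

gamma = 2.0 * beta0                       # Cov(x beta, y) / Var(x beta) for this design
```

Here the OLS slopes equal `gamma * beta` exactly: the cubic cross moments of the symmetric factors vanish, so only the term that is linear in the linear component contributes to the covariance with each factor.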


Remark 1.6. Theorem 1 requires fairly strong assumptions on the regressor variables,
namely, they have to be spherically symmetric (1d) or polynomially normal (1e - 1g). This
should be considered as a desirable property for experimental designs, namely, designs
which satisfy either condition (1d) or (1e - 1g) are robust in the sense of remark 1.2.

For example, spherically symmetric designs enjoy the robustness property under theorem 1.

As another example, we might know from prior experience or theory that the linear model is
applicable on the cube root scale but then have to retransform back to the original
scale. A design in which all factors are symmetric and also have zero kurtosis is then
robust in the sense of remark 1.2.

Remark 1.7. Conditions (1d) or (1e - 1g) rule out the possibility of having interaction
terms in the linear component $x\beta$. In other words we need to assume that under the
appropriate link function, the effects of the regressor variables are additive. For
example, for the general transformation model, this amounts to the assumption that after an
appropriate transformation we have an additive model. See Scheffé, 1959, pp. 95-98, for a
characterization of response surfaces which can be linearized into an additive model.

1.3 Response Surface and Linearized Response Surface

In many situations our ultimate goal is to estimate the response surface

    $\nu(x) = \nu(x\beta) = E(y|x) = \int g(\beta_0 + x\beta, \epsilon)\,dF(\epsilon).$    (1.11)

Note that the response surface depends on the regressor variables $x$ only through $x\beta$. If
we know the true parameters $\beta$, we can estimate the response surface using a nonparametric
regression of $y_i$ on $x_i\beta$. For the completely unspecified GLLM or a subclass which is
unidentified in location and scale, the indeterminacy of the intercept $\beta_0$ and the
multiplicative scalar $\gamma$ is irrelevant. For any two GLF's $g$ and $g^*$ which satisfy
(1.2), the response surfaces are the same.


The deviations from the response surface, $y_i - \nu(x_i)$, are not usually
homoscedastic. If we know the form of the variance function $\sigma^2(x\beta) = \mathrm{Var}(y|x)$, e.g., if
we have an additive-error model or a generalized linear model, we could use weighted
methods in the nonparametric regression. The variance function is also a function of the
linear component $x_i\beta$. When the form of the variance function is known, we can use
iterated weighted nonparametric regression, estimating the variance function from the
deviations from the estimated response surface.

In reality we don't know the true parameters $\beta$, and have to use an estimate $b$ for
it, such as the OLS estimate (1.8). The estimate for the response surface has to be based
on the nonparametric regression of $y$ on $xb$. Therefore it is necessary to consider the
error in the regressor variable $xb$. We don't know of any work in the literature on
nonparametric regression in the presence of errors in variables.
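
The nonparametric regression of $y$ on $xb$ can be sketched with a simple kernel smoother. The following is only an illustration under assumptions of ours: a Tobit model with standard normal regressors and errors, hypothetical parameter values, a Gaussian-kernel (Nadaraya-Watson) smoother with an arbitrary bandwidth, and no correction for the estimation error in $b$.

```python
import math

import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta0, beta = 0.5, np.array([1.0, -2.0])
x = rng.standard_normal((n, 2))
y = np.maximum(beta0 + x @ beta + rng.standard_normal(n), 0.0)   # Tobit GLF

xc = x - x.mean(axis=0)
b = np.linalg.solve(xc.T @ xc, xc.T @ (y - y.mean()))            # OLS slope (1.8)


def response_surface(x_new, bandwidth=0.1):
    """Nadaraya-Watson estimate of E(y|x) at x_new, smoothing y against the index xb."""
    u = (x @ b - x_new @ b) / bandwidth
    w = np.exp(-0.5 * u**2)
    return float(np.sum(w * y) / np.sum(w))


x0 = np.array([0.0, 0.0])
estimate = response_surface(x0)

# Closed form of E max(z + eps, 0) for N(0,1) errors, used here only as a check.
z = float(beta0 + x0 @ beta)
phi = math.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
truth = z * Phi + phi
```

Because $b$ estimates $\beta$ only up to the scalar $\gamma$, the smoother works on a rescaled index; this does not matter for the estimated surface as a function of $x$.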

For a general transformation model with a completely specified GLF, if the GLF $g$ is
invertible, and the OLS estimation is applied on the linear model scale $g^{-1}(y)$, Duan
(1983) proposed the smearing estimate as an estimate for the response surface:

    $s(x) = n^{-1}\sum_i g(\hat\beta_0 + x\hat\beta + \hat\epsilon_i),$

where $\hat\beta_0$, $\hat\beta$, and $\hat\epsilon_i$ are OLS estimates based on the regression of $g^{-1}(y)$ on $x$. Duan
(1983) showed that the smearing estimate is weakly consistent for the response surface
$\nu(x)$ under some regularity conditions.

For a GLLM with a completely specified GLF, we can use the following generalization of
the smearing estimate to estimate the response surface:

    $s(x) = n^{-1}\sum_i g(b_0 + c^{-1} \cdot xb, e_i),$    (1.12)

where $b$ is the OLS estimate (1.8); $b_0$, $c$, and $e_i$ are estimates for $\beta_0$, $\gamma$, and $\epsilon_i$.
In section 4 we give a consistency result for the smearing estimate (1.12).
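
The original smearing estimate for an invertible GLF can be sketched as follows for the logarithmic transformation, $g = \exp$. The simulated data and parameter values are hypothetical; the normal errors are chosen only so that a closed-form value is available for comparison, since the smearing estimate itself does not use the error distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
beta0, beta, sigma = 1.0, np.array([0.5, -0.3]), 0.8
x = rng.standard_normal((n, 2))
y = np.exp(beta0 + x @ beta + sigma * rng.standard_normal(n))   # GLF g = exp

# OLS on the linear model scale g^{-1}(y) = log(y).
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
residuals = np.log(y) - X @ coef


def smearing_estimate(x_new):
    """Average of g(fitted value + residual) over all estimated residuals."""
    return float(np.mean(np.exp(coef[0] + x_new @ coef[1:] + residuals)))


x0 = np.array([1.0, 1.0])
estimate = smearing_estimate(x0)
naive = float(np.exp(coef[0] + x0 @ coef[1:]))        # retransformation without smearing
truth = float(np.exp(beta0 + x0 @ beta + 0.5 * sigma**2))   # lognormal mean, check only
```

The naive retransformation `naive` ignores the error term and falls below the mean response, while `estimate` should be close to `truth`.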

The construction of the estimated response surface, either using nonparametric
regression or the smearing estimate, can be computationally expensive. As an alternative,
we may consider the linearized response surface

    $\rho(x) = \rho_0 + (x-\mu)\theta,$    (1.13)

where $\rho_0 = E(y)$, $\theta = \Sigma^{-1}\Sigma_{xy}$, $\Sigma_{xy} = \mathrm{Cov}(x,y)$. The linearized response surface minimizes
the mean squared prediction error $E(y - \alpha_0 - x\alpha)^2$ over all linear surfaces
$\alpha_0 + x\alpha$; it also minimizes the mean squared approximation error $E(\nu(x) - \alpha_0 - x\alpha)^2$. The
proof in section 1.2 for theorem 1 shows that the OLS estimate $b$ in (1.8) is strongly
consistent for $\theta = \Sigma^{-1}\Sigma_{xy}$ under assumptions (1b) and (1c). Furthermore, the OLS
prediction

    $r(x) = \bar{y} + (x - \bar{x})b$

is strongly consistent for the linearized response surface. What remains to be shown for
theorem 1 is that the slope $\theta$ in the linearized response surface (1.13) is related to the
parameter $\beta$ in GLLM as follows:

    $\theta = \Sigma^{-1}\Sigma_{xy} = \gamma \cdot \beta,$    (1.14)

where $\gamma$ is given in (1.10). We will prove this result in section 2.


Corollary 1. Under the assumptions in theorem 1, the linearized response surface has the
following expression:

    $\rho(x) = \rho_0 + \gamma \cdot (x-\mu)\beta,$

where $\gamma$ is given in (1.10). In particular, the linearized response surface depends on the
regressor variables $x$ only through $x\beta$. □


1.4 Pointwise Slopes and Average Slopes

In many situations we are not interested in the response surface $\nu(x)$ per se.
Instead we are interested in the pointwise slopes, assuming that they exist:

    $\nabla_x \nu(x) = \beta \cdot \nu'(x\beta).$    (1.15)

For a fixed design point $x$, the pointwise slope $\nabla_x \nu(x)$ is the change in the mean
response when the levels of the regressor variables change. Note that in GLLM the
pointwise slopes are always proportional to $\beta$; therefore the OLS estimate $b$ in (1.8) is
strongly consistent for all pointwise slopes up to the multiplicative scale $\gamma/\nu'(x\beta)$.

The multiplicative scalar $\nu'(x\beta)$ in the pointwise slopes (1.15) can be estimated by
differentiating the estimated response surface, either the nonparametric regression
estimate for $\nu$, or the smearing estimate $s$ in (1.12). We will discuss the latter
method in section 4.

The multiplicative scalar $\nu'(x\beta)$ is given by

    $\nu'(x\beta) = \frac{d}{d(x\beta)} \int g(\beta_0 + x\beta, \epsilon)\,dF(\epsilon).$    (1.16)

Assuming that $g$ is differentiable, and that the differentiation and integration can be
interchanged, we have

    $\nu'(x\beta) = \int g_1(\beta_0 + x\beta, \epsilon)\,dF(\epsilon) = E[g_1 | x],$    (1.17)

where $g_1$ is the first partial derivative of $g$.

The same comment in section 1.3 regarding the cost of estimating the response
surface $\nu(x)$ can also be applied to the estimation of the pointwise slopes $\beta \cdot \nu'(x\beta)$. As
an alternative, we can instead estimate the average slopes

    $E_x \nabla_x \nu(x) = \beta \cdot E_x \nu'(x\beta).$    (1.18)

If the expression (1.17) is valid, the average slopes are given by

    $E_x \nabla_x \nu(x) = \beta \cdot E\,g_1(\beta_0 + x\beta, \epsilon).$    (1.19)

Expression (1.19) is usually easier to evaluate than expression (1.18): we don't need
to evaluate the conditional expectation $\nu(x) = E(y|x)$. However, there are important
models such as logistic regression for which the differentiation and integration in (1.16)
cannot be interchanged.

Stein (1981) gave the following lemma which was a crucial tool for evaluating MSE for
the estimation of the normal mean.

Stein's Lemma. Let $u$ be a $N(\mu, \sigma^2)$ real random variable and let the real-valued
function $g(u)$ be the indefinite integral of the Lebesgue measurable function $g'(u)$.
Assume that $E|g'(u)| < \infty$. Then

    $E[g'(u)] = \mathrm{Cov}(u, g(u))/\mathrm{Var}(u).$ □

We will follow Stein's terminology and refer to $g$ as an almost differentiable
function, and refer to $g'$ as the derivative of $g$. Note that the almost differentiable
condition is stronger than being differentiable almost everywhere.
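
Stein's lemma can be checked numerically. The sketch below is a Monte Carlo illustration with choices of our own: the almost differentiable function $g(u) = \max(u, 0)$, whose derivative is the indicator of $u > 0$, and hypothetical values for the mean and standard deviation.

```python
import math

import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 1.0, 2.0
u = mu + sigma * rng.standard_normal(400_000)

g = np.maximum(u, 0.0)             # g(u) = max(u, 0), almost differentiable
g_prime = (u > 0).astype(float)    # g'(u) = 1 if u > 0, else 0

lhs = g_prime.mean()                                  # sample E[g'(u)]
rhs = np.cov(u, g)[0, 1] / np.var(u, ddof=1)          # sample Cov(u, g(u)) / Var(u)
exact = 0.5 * (1.0 + math.erf((mu / sigma) / math.sqrt(2.0)))   # P(u > 0)
```

Both sample quantities should agree with the exact value $P(u > 0)$ up to simulation error, even though $g$ is not differentiable at the origin.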
Theorem 2. The multiplicative scalar (1.10) in theorem 1 is identical to the
multiplicative scalar in the average slopes (1.18),

    $\mathrm{Cov}(x\beta, y)/\mathrm{Var}(x\beta) = E_x \nu'(x\beta),$    (1.20)

under assumptions (1a - 1c) in theorem 1 and the following assumptions.

(2a) The response surface $\nu(x\beta)$ is almost differentiable in $x\beta$.

(2b) $x\beta$ is normally distributed.

The normality assumption (2b) can be replaced by the following polynomial normality
assumptions analogous to (1e - 1g).

(2c) The response surface $\nu(x\beta)$ is polynomial in $x\beta$ of degree $k$.

(2d) The first $k + 1$ cumulants of $x\beta$ are identical to those of the standard
normal variate.

The expression (1.19) can be used in (1.20):

    $\mathrm{Cov}(x\beta, y)/\mathrm{Var}(x\beta) = E\,g_1(\beta_0 + x\beta, \epsilon),$    (1.21)

under the assumptions (1a) - (1c) in theorem 1, assumption (2b), and the following
additional assumptions.

(2e) For all $\epsilon$, the GLF $g(\eta, \epsilon)$ is almost differentiable with respect to $\eta$.

(2f) The differentiation and integration can be interchanged in (1.16). For example,
this is satisfied if $|g_1|$ is dominated uniformly by an integrable function $G$:
$|g_1(\eta, \epsilon)| \le G(\epsilon)$, $E\,G(\epsilon) < \infty$.

Assumption (2b) can be replaced by the following polynomial normality conditions.

(2g) The GLF $g$ is a polynomial in $x\beta$ of degree $k$.

(2h) The first $k + 1$ cumulants of $x\beta$ are identical to those of the standard
normal variable. □


(Proof) Applying Stein's Lemma under the assumption that $x\beta$ is normally distributed, we
have

    $E_x \nu'(x\beta) = \mathrm{Cov}(x\beta, \nu(x\beta))/\mathrm{Var}(x\beta).$

Note that

    $\mathrm{Cov}(x\beta, y) = E[(x-\mu)\beta \cdot g(\beta_0 + x\beta, \epsilon)]$

    $= E\,E^{x}[(x-\mu)\beta \cdot g(\beta_0 + x\beta, \epsilon)]$

    $= E[(x-\mu)\beta \cdot \nu(x\beta)]$

    $= \mathrm{Cov}(x\beta, \nu(x\beta)).$

Note that we adopt Stein's notation that a superscript for the operators E, Var, and Cov
indicates applying the operator conditioned on the superscript.

The result under the polynomial normality conditions (2c) and (2d) is proved in the
same way as the proof for theorem 1 under the polynomial normality assumptions (1e - 1g),
to be given in section 2. □

Remark 1.8. For a general transformation model with a completely specified invertible
GLF $g$, if expression (1.21) is valid, we can estimate the multiplicative scalar
$\gamma = E\,g'(g^{-1}(y))$ by the sample average $A_n = n^{-1}\sum_i g'(g^{-1}(y_i))$, which converges almost surely
to $\gamma$ by the strong law of large numbers. If the second moment for $g'$ exists,
$n^{1/2}(A_n - \gamma)/s_n$ converges in law to the standard normal distribution, where $s_n$ is the
sample standard deviation. We can therefore construct confidence intervals for $\gamma$; in
particular, we can test the null hypothesis $\gamma = 0$.
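
The estimate $A_n$ and the associated test can be sketched for a linear model on the cube root scale, i.e., the GLF $g(t) = t^3$, for which $g'(g^{-1}(y)) = 3y^{2/3}$. The simulated data and parameter values below are hypothetical, and we simply assume that expression (1.21) is valid for this model with normal regressors.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
beta0, beta, sigma = 1.0, np.array([0.5, -0.5]), 0.5
x = rng.standard_normal((n, 2))
y = (beta0 + x @ beta + sigma * rng.standard_normal(n)) ** 3   # g(t) = t^3

# g'(g^{-1}(y)) = 3 * (y^{1/3})^2 for the cubic GLF.
terms = 3.0 * np.cbrt(y) ** 2
a_n = terms.mean()                   # estimate of gamma
s_n = terms.std(ddof=1)              # sample standard deviation
z_stat = np.sqrt(n) * a_n / s_n      # test statistic for the null hypothesis gamma = 0
ci = (a_n - 1.96 * s_n / np.sqrt(n), a_n + 1.96 * s_n / np.sqrt(n))

# OLS slopes on the original scale, for comparison with a_n * beta.
xc = x - x.mean(axis=0)
b = np.linalg.solve(xc.T @ xc, xc.T @ (y - y.mean()))
gamma_true = 3.0 * (beta0**2 + beta @ beta + sigma**2)   # E 3(beta0 + x beta + eps)^2
```

Dividing the OLS slopes `b` by `a_n` then gives an estimate of the slope vector itself, which is the use made of the multiplicative scalar for a completely specified GLF.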

Corollary 2. Under the assumptions in theorems 1 and 2, the OLS estimate $b$ in (1.8) has
the following almost sure limit:

    $b \to \gamma \cdot \beta$ (a.s.),

where    $\gamma = E_x \nu'(x\beta)$

or    $\gamma = E\,g_1(\beta_0 + x\beta, \epsilon).$ □

Corollary 3. Under the assumptions in theorems 1 and 2, the linearized response surface
has the alternative expression

    $\rho(x) = \rho_0 + \gamma \cdot (x - \mu)\beta,$

where $\gamma$ is given as in corollary 2. □


2. PROOF AND DISCUSSION OF THEOREM 1

In this section we give the proof for theorem 1, and give several useful
generalizations.

Proof of theorem 1

We need to prove that

    $v = E(x-\mu)y = \gamma \cdot \beta'\Sigma.$

Let $\theta$ be a $p$ dimensional column vector such that $\beta'\Sigma\theta = 0$, $\theta'\Sigma\theta = 1$. We will show
that $v\theta = 0$. Note that

    $v\theta = E(x-\mu)\theta \cdot g(\beta_0 + x\beta, \epsilon)$

    $= E\,E^{\epsilon, x\beta}(x-\mu)\theta \cdot g(\beta_0 + x\beta, \epsilon)$

    $= E[g(\beta_0 + x\beta, \epsilon) \cdot E^{\epsilon, x\beta}(x-\mu)\theta]$

    $= E[g(\beta_0 + x\beta, \epsilon) \cdot E^{x\beta}(x-\mu)\theta].$

Since $\mathrm{Cov}(x\beta, x\theta) = \beta'\Sigma\theta = 0$, $x\beta$ and $x\theta$ are uncorrelated. If $x$ has a multinormal
distribution, it follows that $x\beta$ and $x\theta$ are stochastically independent, therefore

    $E^{x\beta}(x-\mu)\theta = 0,$    (2.1)

thus $v\theta = 0$. We prove in Appendix A that the same condition holds if $x$ is spherically
symmetric.

Having proved that $v\theta = 0$ for all $\theta$ such that $\beta'\Sigma\theta = 0$, it follows that $v$ must
fall along the direction $\beta'\Sigma$, i.e., $v = a \cdot \beta'\Sigma$ for some scalar $a$. We now determine the
scalar $a$. On the one hand,

    $v\beta = a \cdot \beta'\Sigma\beta = a \cdot \mathrm{Var}(x\beta).$

On the other hand,

    $v\beta = E(x-\mu)\beta \cdot y = \mathrm{Cov}(x\beta, y).$

Therefore $a = \mathrm{Cov}(x\beta, y)/\mathrm{Var}(x\beta)$ as given in (1.10) in theorem 1.
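
The key step of this argument, namely that $v$ is orthogonal to every $\theta$ with $\beta'\Sigma\theta = 0$, can be checked by simulation for a spherically symmetric design that is not normal. The sketch below uses assumptions of our own: regressors uniformly distributed on the unit sphere in three dimensions (so that $\Sigma = I/3$), a dichotomous GLF with standard normal errors, and hypothetical parameter values.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
z = rng.standard_normal((n, 3))
x = z / np.linalg.norm(z, axis=1, keepdims=True)   # uniform on the unit sphere, not normal

beta0, beta = 0.2, np.array([2.0, -1.0, 1.0])
y = (beta0 + x @ beta + rng.standard_normal(n) > 0).astype(float)   # dichotomous GLF

v = (x - x.mean(axis=0)).T @ (y - y.mean()) / n    # sample version of v = E(x - mu) y
theta = np.array([1.0, 1.0, -1.0])                 # beta' Sigma theta = 0, theta' Sigma theta = 1
v_theta = float(v @ theta)                         # should be near zero
v_beta = float(v @ beta)                           # sample Cov(x beta, y), clearly nonzero
```

The projection `v_theta` should vanish up to simulation error while `v_beta` does not, which is the sense in which $v$ falls along the direction $\beta'\Sigma$.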

We now prove the theorem under the polynomial normality conditions (1e - 1g). Let
$\xi = (x-\mu)A'$ be the orthonormalized regressor variables given in (1f). Note that

    $\beta_0 + x\beta = \beta_0^* + \xi\vartheta,$

    $v\theta = E(x-\mu)\theta \cdot y = E\,\xi\tau \cdot g(\beta_0^* + \xi\vartheta, \epsilon),$    (2.2)

where $\beta_0^* = \beta_0 + \mu\beta$, $\vartheta = A\Sigma\beta$, $\tau = A\Sigma\theta$.

By assumption (1e), $g$ is a polynomial in $\xi\vartheta$ of degree $k$; it follows that the
integrand in (2.2) is a multivariate polynomial in $(\xi_1, \ldots, \xi_p)$ of degree $k + 1$:

    $v\theta = E\{\sum_J a_J \prod_{\rho=1}^{p} \xi_\rho^{j(\rho)}\},$    (2.3)

where the summation is taken over all index sets $J = \langle j(1), \ldots, j(p) \rangle$ such that
$\sum_\rho j(\rho) \le k + 1$. The coefficients $a_J$ depend on $\epsilon$, $\beta_0^*$, $\vartheta$, and $\tau$.

Since the orthonormalized regressor variables $\xi$ are assumed to be mutually
independent, the expectation in (2.3) can be taken term by term:

    $v\theta = \sum_J E(a_J) \prod_{\rho=1}^{p} m_{\rho, j(\rho)},$    (2.4)

where $m_{\rho, j(\rho)}$ is the $j(\rho)$-th moment for $\xi_\rho$.

By assumption (1g), the moments $m$ are identical to those of a corresponding standard
normal variate. Therefore $v\theta$ has the same value if we replace $\xi$ by the corresponding
multinormal random vector $\xi^* \sim N(0, I)$. Since the multinormal random vector is spherically
symmetric, by the first part of theorem 1, we have $v\theta = 0$. □


Remark 2.1. For logistic regression, a special case of example 1.2, the result in theorem
1 for normally distributed regressor variables has been known since Fisher (1936). See
Haggstrom (1983) for a comprehensive discussion of the OLS estimation for the logistic
regression model. As was noted in section 1.2, Rand researchers have been using the OLS
estimates to derive starting values for both the logistic and probit regressions, with
generally satisfactory results. For probit regression, this is motivated by the fact that
the normal c.d.f. is very well approximated by the logistic function after the appropriate
scale adjustment. See Haggstrom (1983, p. 236) for a brief discussion. Theorem 1 gives
further justification for the use of OLS estimates to devise starting values for probit
regression.
Remark 2.2. Goldberger (1981) showed a result similar to theorem 1 for linear models after
selection under the assumption that the regressor variables $x$ and the error terms $\epsilon$ are
jointly normal. (See also Maddala 1983, pp. 168-170.) Chung and Goldberger (1984) showed
a similar result under the assumption that $E(x|y)$ is linear.

Remark 2.3. Brillinger (1982) showed the result in theorem 1 for the additive-error model
in example 1.4, under the assumption that the regressor variables $x$ are normally
distributed. Brillinger also gave expression (1.21) in theorem 2 for the normal case. In
an example, Brillinger also commented on the generalization of the theorem to deal with the
Tobit regression model in example 1.3. For consideration of experimental designs, the
spherical symmetry condition in theorem 1 is an important improvement over the restriction
to multinormal distributions.

Remark 2.4. It follows from Cacoullos' (1967) theorem 1 that the only spherically
symmetric distribution whose components are mutually independent is the multinormal
distribution. Otherwise the components are stochastically dependent, even though they are
uncorrelated. See, also, Kagan et al. (1973, chapter 5). Therefore the only intersection
between the spherical symmetry condition (1d) and the polynomial normality conditions
(1e - 1g) is the normal case.

Remark 2.5. In the proof of theorem 1, we do not need to consider the entire slope
vector $\beta$ simultaneously.

Corollary 4. Let the regressor variables $x$ be partitioned into two parts, $x_1$ and $x_2$;
partition the slope vector $\beta$ and the OLS estimate $b$ correspondingly into $\beta_1$, $\beta_2$ and
$b_1$, $b_2$. We have

    $b_1 \to \gamma_1 \cdot \beta_1$ (a.s.),    (2.5)

where    $\gamma_1 = \mathrm{Cov}(x_1\beta_1, y)/\mathrm{Var}(x_1\beta_1),$    (2.6)

under assumptions (1a - 1c) in theorem 1, and the following additional assumptions.

(1h) The two subsets of regressor variables $x_1$ and $x_2$ are stochastically
independent.

(1i) The subset of regressor variables $x_1$ is either spherically symmetric (1d) or
polynomially normal (1e - 1g). □

The multiplicative scalar $\gamma_1$ can be expressed alternatively as

    $\gamma_1 = E\,\nu'(x\beta),$    (2.7)

or    $\gamma_1 = E\,g_1(\beta_0 + x\beta, \epsilon),$    (2.8)

under the assumptions in theorem 2, with $x_1\beta_1$ replacing $x\beta$ in (2b), (2d), and (2h). □

(Proof). In the proof for either theorem, replace $E$ by $E^{\epsilon, x_2}$. □
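
Corollary 4 can be illustrated with a design in which only part of the regressors is symmetric. The sketch below rests on assumptions of our own: a normal subset $x_1$, an independent exponentially distributed (hence skewed) regressor $x_2$, a Tobit GLF with standard normal errors, and hypothetical parameter values.

```python
import numpy as np

rng = np.random.default_rng(13)
n = 200_000
x1 = rng.standard_normal((n, 2))            # symmetric (normal) subset
x2 = rng.exponential(1.0, size=(n, 1))      # skewed subset, independent of x1
beta0, beta1, beta2 = -0.5, np.array([1.0, -1.0]), np.array([0.8])

latent = beta0 + x1 @ beta1 + x2 @ beta2 + rng.standard_normal(n)
y = np.maximum(latent, 0.0)                 # Tobit GLF

# Joint OLS regression of y on (x1, x2); keep the slopes for the x1 block.
x = np.hstack([x1, x2])
xc = x - x.mean(axis=0)
b = np.linalg.solve(xc.T @ xc, xc.T @ (y - y.mean()))
ratio1 = b[:2] / beta1                      # should be roughly constant, near gamma_1

x1b1 = x1 @ beta1
gamma1_cov = np.cov(x1b1, y)[0, 1] / np.var(x1b1, ddof=1)   # sample version of (2.6)
gamma1_deriv = float(np.mean(latent > 0))                   # sample version of (2.8)
```

For the censoring GLF the derivative $g_1$ is the indicator that the latent variable is positive, so (2.8) reduces to the probability of an uncensored observation; the simulation uses the latent variable only because it is available in simulated data.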

Remark 2.6. Corollary 4 allows us to consider the slope parameters in subsets. At the
minimum, if we have at least one symmetrically distributed regressor variable, corollary 4
can be applied. (A symmetric variable is spherically symmetric!)

Remark 2.7. When assumption (1h) is not satisfied, we might still be able to orthogonalize
the two subsets of regressor variables and apply corollary 4. Consider, for example, an
observational study in which we draw random samples from two different populations; let $x_1$
be the label for the population, $x_2$ be the regressor variables. Assume that the
distributions of $x_2$ in the two populations differ by a location shift:

    $x_2 = x_1\mu + z_2,$

where $z_2$ is independent of $x_1$; assume that $z_2$ satisfies the conditions in corollary 4.

If we know the shift $\mu$, we can reparametrize the model as follows:

    $y = g(\beta_0 + x_1(\beta_1 + \mu\beta_2) + (x_2 - x_1\mu)\beta_2, \epsilon),$

and apply corollary 4 to $z_2$. In reality, we need to estimate $\mu$ from the sample.

Remark 2.8. For the additive-error model in example 1.4, we don't need the error terms $\varepsilon_i$
to be identically distributed; they don't even need to be independent of the regressor
variables $x_i$. All that is required is that when we apply OLS, the term corresponding to
$\varepsilon$ converges to zero:

$n^{-1}\sum_i (x_i - \bar{x})'\varepsilon_i \to 0$ (a.s.).  (2.9)

In a sense, all GLLM's can be considered as additive error models:

$y = g(\beta_0 + x\beta, \varepsilon) = v(\beta_0 + x\beta) + \varepsilon^*$,
where $\varepsilon^* = y - v(\beta_0 + x\beta)$.

The distribution of $\varepsilon^*$ will depend on $x$ in general. Since conditions that guarantee
the convergence in (2.9) for $\varepsilon^*$ are not easy to formulate, we will continue to use the
GLLM specification.


3. PARAMETER ESTIMATION FOR A COMPLETELY SPECIFIED GLF

When the GLF in a GLLM is completely specified, we can usually estimate the parameters
$\beta_0$ and $\gamma$ left undetermined in theorem 1. We can also estimate the residuals $\varepsilon_i$.
Throughout this section, we will assume the conditions in theorem 2 for expression (1.21)
for the multiplicative scalar $\gamma$, i.e., we assume conditions (1a - 1c), (2b), and
(2e - 2f).

3.1 GLLM

First we consider the estimation for $\beta_0$ and $\gamma$ in a GLLM with a completely
specified GLF. Assume that expression (1.21) in theorem 2 is valid:

$\gamma = E\, g_1(\beta_0 + x\beta, \varepsilon)$.  (3.1)

Furthermore, we assume

$E\,\varepsilon = 0$.  (3.2)

We have $b \to \gamma\beta$,  (3.3)

and $y_i = g(\beta_0 + x_i\beta, \varepsilon_i)$.  (3.4)

Replacing the unknown parameters by the corresponding estimates in (3.1), (3.2), and (3.4),
we have the following system of simultaneous equations:

$c = n^{-1}\sum_i g_1(b_0 + c^{-1}x_i b, e_i)$,  (3.5)

$0 = n^{-1}\sum_i e_i$,  (3.6)

$y_i = g(b_0 + c^{-1}x_i b, e_i)$,  (3.7)

where $b$ is the OLS estimate (1.8), $c$ is our estimate for $\gamma$, $b_0$ estimates $\beta_0$, and
$e_i$ estimates $\varepsilon_i$. We have $n + 2$ equations with $n + 2$ unknowns, so the system
(3.5) - (3.7) is determined in principle, although a solution might not exist or might not be unique.
Given that $g$ is a nonlinear function, the equations have to be solved by iteration; given
the dimension of the system, they are probably expensive to solve.

The system of equations (3.5) - (3.7) is much easier to solve if the given GLF is
invertible in its second variable. In this case we can solve for $e_i$ in (3.7) by

$e_i = g^{-1}(b_0 + c^{-1}x_i b, y_i)$,  (3.8)

where we use $g^{-1}$ to denote the inverse of the function $g(\eta, \varepsilon)$ for a fixed $\eta$.
Substituting (3.8) into (3.5) and (3.6), we have

$c = n^{-1}\sum_i g_1(b_0 + c^{-1}x_i b, g^{-1}(b_0 + c^{-1}x_i b, y_i))$,  (3.9)

and $0 = n^{-1}\sum_i g^{-1}(b_0 + c^{-1}x_i b, y_i)$.  (3.10)

We need to solve for $b_0$ and $c$ in the two-equation system (3.9) and (3.10). If both $g_1$
and $g^{-1}$ are easy to evaluate, it is not difficult to solve this system. Whether the
solutions, if any, are consistent needs to be determined for specific GLLM's.

Remark 3.1. The estimated terms $e_i$ in (3.8) are the generalized residuals considered in
Cox and Snell (1968).


3.2 General Transformation Models

For a general transformation model (example 1.1) with a completely specified GLF, we
can estimate the multiplicative scalar by the sample mean

$c = n^{-1}\sum_i g'(g^{-1}(y_i)) \to \gamma = E\, g'(\beta_0 + x\beta + \varepsilon)$,  (3.11)

assuming that expression (1.23) in theorem 2 is valid, and that the given link function
$g$ is invertible. We can solve for $e_i$ in (3.7) by

$e_i = g^{-1}(y_i) - b_0 - c^{-1}x_i b$.  (3.12)

It follows then from (3.6) that

$b_0 = n^{-1}\sum_i g^{-1}(y_i) - c^{-1}\bar{x}b$.  (3.13)

Note that

$b_0 \to E\, g^{-1}(y) - \mu\beta = \beta_0$ (a.s.).  (3.14)
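The estimates (3.11) - (3.13) are simple enough to compute directly. The following sketch does so for an exponential link, $g(\eta) = \exp(\eta)$, on simulated data with standard normal regressors. The true parameter values, the error distribution, and the sample size are assumptions made for the illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical true parameters for the illustration.
beta0, beta = 0.3, np.array([0.4, -0.2])
x = rng.standard_normal((n, 2))
eps = rng.normal(0.0, 0.3, size=n)        # error term with E(eps) = 0
y = np.exp(beta0 + x @ beta + eps)        # y = g(beta0 + x beta + eps), g = exp

# OLS of y on x on the untransformed scale; drop the intercept.
design = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(design, y, rcond=None)[0][1:]

g_inv = np.log(y)                         # g^{-1}(y_i)
c = np.mean(np.exp(g_inv))                # (3.11): g'(g^{-1}(y_i)) = y_i when g = exp
b0 = g_inv.mean() - (x.mean(axis=0) @ b) / c   # (3.13)
e = g_inv - b0 - (x @ b) / c                   # (3.12), generalized residuals

print("b / c:", b / c, " b0:", b0)
```

Here `b / c` should be close to the assumed `beta` and `b0` close to `beta0`; the residuals `e` average to zero by construction, which is equation (3.6).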


Example 3.1. For the Tobit regression model in example 1.3, we have

$\gamma = E\, g'(\beta_0 + x\beta + \varepsilon) = P(y > 0)$.

Note that for $y > 0$, $g^{-1}(y) = y$; for $y = 0$, i.e., $\beta_0 + x\beta + \varepsilon \le 0$, $g^{-1}(y)$ is
undetermined but $g'(\beta_0 + x\beta + \varepsilon) = 0$.

We can therefore estimate $\gamma$ by

$c = n^{-1} \cdot \#(y_i > 0)$.
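A small simulation makes the Tobit estimate concrete. The sketch below assumes standard normal regressors and a standard normal error, with hypothetical parameter values; it computes the OLS slopes on the censored response, estimates $\gamma$ by the fraction of positive responses, and rescales.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical true parameters for the illustration.
beta0, beta = 0.5, np.array([1.0, -0.5])
x = rng.standard_normal((n, 2))
latent = beta0 + x @ beta + rng.standard_normal(n)
y = np.maximum(latent, 0.0)               # Tobit response, censored at zero

# OLS of the censored response on x; drop the intercept.
design = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(design, y, rcond=None)[0][1:]

c = np.mean(y > 0)                        # c = #(y_i > 0) / n
beta_hat = b / c                          # rescaled OLS slopes

print("c:", c, " b / c:", beta_hat)
```

Under these assumptions `beta_hat` should be close to `beta`, although OLS was applied to the censored data without any iterative fitting.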

Note  that  the  dominated  convergence  condition  (2f)  in  theorem  2  is  satisfied  for  the  Tobit 

3.3. Generalized Linear Models


For the generalized linear model in example 1.5, expressions (1.20) and (1.21) for the
multiplicative scalar $\gamma$ are the same, i.e., we need only consider $v'$ in the partial
derivative $g_1$:

$c = n^{-1}\sum_i v'(b_0 + c^{-1}x_i b)$.  (3.15)

Furthermore, the error terms can be estimated by

$e_i = [y_i - v(b_0 + c^{-1}x_i b)]/\sigma(b_0 + c^{-1}x_i b)$,  (3.16)

and equation (3.6) reduces to

$\bar{y} = n^{-1}\sum_i v(b_0 + c^{-1}x_i b)$.  (3.17)

The estimates $b_0$ and $c$ can be solved from (3.15) and (3.17).

Example 3.2. Consider a generalized linear model with an exponential link function
$v(\eta) = \exp(\eta)$. It follows from (3.15) and (3.17) that

$c = \bar{y}$.  (3.18)

Note that $c$ converges almost surely to $E(y) = E\, v(\beta_0 + x\beta) = E\, v'(\beta_0 + x\beta) = \gamma$.
Substituting (3.18) in (3.15), we have

$\exp(b_0) = \bar{y}/(n^{-1}\sum_i \exp(x_i b/\bar{y}))$.  (3.19)

The denominator in (3.19) converges almost surely to $E \exp(x\beta)$, therefore $\exp(b_0)$
converges almost surely to $\exp(\beta_0)$.
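The closed-form estimates (3.18) and (3.19) can be checked numerically. The sketch below assumes standard normal regressors, an additive normal error with constant scale, and hypothetical parameter values; none of these choices come from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical true parameters for the illustration.
beta0, beta = 0.2, np.array([0.5, -0.3])
x = rng.standard_normal((n, 2))
y = np.exp(beta0 + x @ beta) + rng.normal(0.0, 0.5, size=n)   # v = exp, constant sigma

# OLS of y on x; drop the intercept.
design = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(design, y, rcond=None)[0][1:]

c = y.mean()                                     # (3.18)
exp_b0 = c / np.mean(np.exp((x @ b) / c))        # (3.19)

print("b / c:", b / c, " b0:", np.log(exp_b0))
```

With these settings `b / c` should approximate `beta` and `log(exp_b0)` should approximate `beta0`, in line with the almost sure limits stated in the example.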

3.4. General Scaled Transformation Models

For a general scaled transformation model (example 1.6) with a completely specified
GLF, we can estimate $\gamma$ and $\beta_0$ by solving the following two-equation system:

$c = n^{-1}\sum_i g'(g^{-1}(y_i)) \cdot v'(b_0 + c^{-1}x_i b)$,  (3.20)

and $n^{-1}\sum_i g^{-1}(y_i) = n^{-1}\sum_i v(b_0 + c^{-1}x_i b)$,  (3.21)

assuming  that  the  transformation  function  is  invertible.  The  error  terms  can  be  estimated 

4. SMEARING ESTIMATION FOR THE RESPONSE SURFACE


In section 1.3 we discussed the estimation of the response surface $\mu(x) = E(y|x)$ for
a completely unspecified GLLM and for partially specified GLLM's, using nonparametric
regression of $y_i$ on $x_i b$. For a GLLM with completely specified GLF, it is still possible
to use nonparametric regression to estimate the response surface; however, we should be
able to do better by making use of the information available on the GLF. In this section
we consider the smearing estimate (1.12) in section 1.3 and derive some of its properties.

4.1. Smearing Estimate for GLLM

For a given GLLM of a general form, the smearing estimate is given by (1.12), with
estimates $b_0$, $c$, and $e_i$ given in section 3.1. In section 4.2 we give a consistency
result for this estimate.

For a given general transformation model, the smearing estimate is given by

$s(x) = n^{-1}\sum_i g(c^{-1}xb + g^{-1}(y_i) - c^{-1}x_i b)$,  (4.1)

with $c$ given by (3.11). We give a consistency result for this estimate in section 4.3.
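The smearing estimate (4.1) is a one-line average once $b$ and $c$ are available. The sketch below wraps it in a function and tries it on simulated data with an exponential link, where the true response surface is known in closed form. The function name `smearing_estimate`, the parameter values, and the normal error are assumptions made for the illustration.

```python
import numpy as np

def smearing_estimate(x0, x, y, b, c, g, g_inv):
    """Smearing estimate (4.1) of E(y | x = x0) for y = g(beta0 + x beta + eps)."""
    shift = ((x0 - x) @ b) / c            # c^{-1} x0 b - c^{-1} x_i b, one value per i
    return np.mean(g(g_inv(y) + shift))

rng = np.random.default_rng(4)
n = 200_000

# Hypothetical true parameters for the illustration.
beta0, beta, sigma = 0.3, np.array([0.4, -0.2]), 0.3
x = rng.standard_normal((n, 2))
y = np.exp(beta0 + x @ beta + rng.normal(0.0, sigma, size=n))

design = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(design, y, rcond=None)[0][1:]   # OLS slopes
c = np.mean(y)                                      # (3.11) for g = exp

x0 = np.array([1.0, -1.0])
s_x0 = smearing_estimate(x0, x, y, b, c, np.exp, np.log)
truth = np.exp(beta0 + x0 @ beta + sigma**2 / 2)    # E(y | x0) under a normal error

print("smearing:", s_x0, " truth:", truth)
```

Note that the estimate never uses the normality of the error: the average over the residuals replaces the unknown error distribution, which is the point of smearing.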

For a given generalized linear model, there is no need to "smear" over the
residuals to estimate the response surface; we can simply use

$s(x) = v(b_0 + c^{-1}xb)$,  (4.2)

with the estimates $b_0$ and $c$ given in section 3.3. If the estimates $b_0$ and $c$ are
consistent (e.g., if the link function $v$ is exponential as in example 3.2), and $v$ is
continuous at $\beta_0 + x\beta$, the estimated response surface is consistent.

The smearing estimate for the general scaled transformation model is

$s(x) = n^{-1}\sum_i g(\hat{v} + \hat{\sigma} e_i)$,  (4.3)

with $\hat{v} = v(b_0 + c^{-1}xb)$, $\hat{\sigma} = \sigma(b_0 + c^{-1}xb)$; $b_0$, $c$, and $e_i$ are given in section 3.4.

4.2. Smearing Estimate for GLLM

If $b_0$, $b$, $c$, and $e_i$ are good estimates for the corresponding unknown quantities,
we have

Theorem 4. The OLS estimate $b$ in (1.7) is consistent for $\gamma\beta$ of order $n^{-1/2}$ for the
general transformation model:

$n^{1/2}(b - \gamma\beta) = O_p(1)$,

under assumptions (1a - c) in theorem 1 and the following assumption:

(4a) The expectation $E\, y^2 \cdot x'x$ exists. ∎

(The proof is given in Appendix C.)

Corollary 5. The smearing estimate (4.1) for the general transformation family is weakly
consistent under the assumptions in theorem 1, assumption (4a) in theorem 4, and the
following assumptions:

(4e) The GLF $g$ is continuously differentiable.

(4f) The following expectation exists for all $M > 0$:

$E\{\sup[\,|g'(\beta_0 + x_0\beta + \varepsilon + t)|^2 : |t| < M\,]\}$. ∎  (4.9)

(Proof) It was noted in section 3.2 that the estimates $b_0$ and $c$ exist and are strongly
consistent. The estimates $e_i$ exist and the squared sum in (4.5) can be expressed as
follows:

$\sum_i (e_i - \varepsilon_i)^2 = n\bar{\varepsilon}^2 + c^{-2}\sum_i ((x_i - \bar{x})(b - c\beta))^2$.  (4.10)

The first term in (4.10) is asymptotically bounded by the central limit theorem. The
denominator in the second term converges to $\gamma^2$. The numerator in the second term can be
written as

$n^{1/2}(b - c\beta)' \cdot n^{-1}\sum_i (x_i - \bar{x})'(x_i - \bar{x}) \cdot n^{1/2}(b - c\beta)$.  (4.11)

The second term in (4.11) converges almost surely to $\Sigma$. The third term can be decomposed as

$n^{1/2}(b - \gamma\beta) - n^{1/2}(c - \gamma)\beta$.  (4.12)

The first term is asymptotically bounded by theorem 4. The second term is asymptotically
bounded by the central limit theorem applied to (3.11). Therefore (4.11) is $O_p(1)$, thus
(3b) is satisfied. The moment condition (3d) reduces to (4f) for the general
transformation family. ∎


Remark 4.1. The moment condition (4f) is the same as the moment condition in Duan (1983),
which considers the smearing estimate when the OLS is applied on the transformed scale
$g^{-1}(y_i)$. Duan (1983) noted that if $|g'|$ is monotonic, we can replace the moment
condition (4f) by

$E(g'(c + \varepsilon))^2 < \infty$ for all $c$.  (4.13)

If the GLF is the power function $g(\eta) = \eta^q$, the moment condition (4.9) reduces to

$E(c + \varepsilon)^{2(q-1)} < \infty$ for all $c$,

which is satisfied if $q > 1$ when the error term $\varepsilon$ follows a normal distribution. Note
that $q > 1$ implies that the linearizing transformation is the power transformation

$g^{-1}(y) = y^{1/q}$,

where the power parameter $1/q$ falls between zero and one.

If the GLF is exponential, $g(\eta) = \exp(\eta)$, i.e., the linearizing transformation
$g^{-1}$ is the logarithmic transformation, the moment condition (4.9) reduces to
$E \exp(2\varepsilon) < \infty$, which is satisfied for the normal error distribution.

4.4. Smearing Estimate for Pointwise Slope Vector

The pointwise slope vector given by (1.15) can be estimated by differentiating the
smearing estimate (1.12):

$\nabla s(x) = c^{-1}b \cdot n^{-1}\sum_i g_1(b_0 + c^{-1}xb, e_i)$.  (4.14)

The second factor in (4.14) estimates the derivative $v'(x\beta)$ in (1.15).

Corollary 6. The smearing estimate (4.14) for the pointwise slope vector in a GLLM with a
completely specified GLF is weakly consistent under the assumptions in theorem 1 and (3a) -
(3d) in theorem 3, with $g$ replaced by $g_1$ in (3c), and $g_1$, $g_2$ replaced by $g_{11}$, $g_{12}$
in (4.6) and (4.7). ∎

Corollary 7. For a general transformation model with a completely specified GLF, the
smearing estimate is weakly consistent for the pointwise slope (1.15) under the assumptions
in theorem 1, assumption (4a) in theorem 4, and assumptions (4e - f) in corollary 5, with
$g$ replaced by $g'$ in (4e) and $g'$ replaced by $g''$ in (4f). ∎

Remark 4.2. For the generalized linear model, there is no need to "smear" over the
residuals to estimate the pointwise slopes; we can simply differentiate (4.2):

$\nabla s(x) = c^{-1}b \cdot v'(b_0 + c^{-1}xb)$,  (4.1

which is consistent if $b_0$ and $c$ are consistent.
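For the exponential link of example 3.2 the pointwise slope estimate in Remark 4.2 is available in closed form, so it can be compared with the true slope on simulated data. The sketch below assumes standard normal regressors, an additive normal error, and hypothetical parameter values and evaluation point.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

# Hypothetical true parameters for the illustration.
beta0, beta = 0.2, np.array([0.5, -0.3])
x = rng.standard_normal((n, 2))
y = np.exp(beta0 + x @ beta) + rng.normal(0.0, 0.5, size=n)   # GLM with v = exp

design = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(design, y, rcond=None)[0][1:]             # OLS slopes
c = y.mean()                                                  # (3.18)
b0 = np.log(c / np.mean(np.exp((x @ b) / c)))                 # from (3.19)

x0 = np.array([0.5, 0.5])                                     # point of evaluation
slope_hat = (b / c) * np.exp(b0 + (x0 @ b) / c)               # c^{-1} b v'(b0 + c^{-1} x0 b)
slope_true = beta * np.exp(beta0 + x0 @ beta)                 # gradient of the true surface

print("estimated slope:", slope_hat)
print("true slope     :", slope_true)
```

The estimated slope vector inherits its direction from the OLS estimate `b` and its length from the specified link function, which is how the two pieces of information are combined in this section.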

APPENDIX A: PROOF OF THEOREM 1 UNDER SPHERICAL SYMMETRY

We need to show that

$E^{x\beta}(x - \mu)\theta = 0$,  (A.1)

where $\theta'\Sigma\beta = 0$. Let $\beta^* = (\beta'\Sigma\beta)^{-1/2}\beta$. For $p > 2$, we can find $\beta_3, \dots, \beta_p$ such
that $B = [\theta, \beta^*, \beta_3, \dots, \beta_p]$ is unitary: $B'\Sigma B = I$.

Let $\zeta = (x - \mu)B$. Since $x$ is spherically symmetric centered at $\mu$, $\zeta$ has the same
distribution as $(x - \mu)\Sigma^{-1/2}$. (Let $\zeta^* = \zeta\Sigma^{1/2} = (x - \mu)B\Sigma^{1/2}$; note that
$(B\Sigma^{1/2})'\Sigma(B\Sigma^{1/2}) = \Sigma$, therefore $\zeta^*$ has the same distribution as $x - \mu$.)

Claim: $\zeta$ is spherically symmetric with respect to the usual inner product $[v, w] = v'w$,
i.e., $\zeta A$ has the same distribution as $\zeta$ if $A'A = I$.

(Proof) Note that $A'B'\Sigma BA = I$, thus $BA$ is unitary with respect to the inner product
$\langle\cdot,\cdot\rangle$. Therefore $\zeta A = (x - \mu)BA$ has the same distribution as $(x - \mu)\Sigma^{-1/2}$. ∎

Consider the diagonal matrix $A$ with $A_{11} = -1$ and all other diagonal elements being
one. It follows from the claim that

$\zeta A = (-\zeta_1, \zeta_2, \dots, \zeta_p)$

has the same distribution as $\zeta$. Therefore

$E(-\zeta_1 \mid \zeta_2, \dots, \zeta_p) = E(\zeta_1 \mid \zeta_2, \dots, \zeta_p)$.  (A.2)

Integrating (A.2) with respect to $\zeta_3, \dots, \zeta_p$, we have

$E^{x\beta}(x - \mu)\theta = E(\zeta_1 \mid \zeta_2) = 0$.

This proves (A.1) and completes the proof of theorem 1. ∎

APPENDIX B: PROOF OF THEOREM 4

Let $\delta = (b_0 - \beta_0) + x(c^{-1}b - \beta)$, $\tau_i = e_i - \varepsilon_i$. Taking the first order Taylor's expansion
for

$g(b_0 + c^{-1}xb, e_i) - g(\beta_0 + x\beta, \varepsilon_i)$

in the direction $(\delta, \tau_i)$, we have

$g(b_0 + c^{-1}xb, e_i) - g(\beta_0 + x\beta, \varepsilon_i) = \delta \cdot g_{1i} + \tau_i \cdot g_{2i}$,

where $0 < \theta_i < 1$, and $g_{ji} = g_j(\beta_0 + x\beta + \theta_i\delta, \varepsilon_i + \theta_i\tau_i)$, $j = 1, 2$.

We need to show that

$\delta \cdot n^{-1}\sum_i g_{1i} \to 0$ (p),  (B.1)

and $n^{-1}\sum_i \tau_i g_{2i} \to 0$ (p).  (B.2)

By assumption (4a) and theorem 1, the first factor in (B.1), $\delta$, converges to zero in
probability. By the Cauchy-Schwarz inequality, the square of (B.2) can be bounded by

$n^{-1}\sum_i \tau_i^2 \cdot n^{-1}\sum_i (g_{2i})^2$.  (B.3)

By assumption (4b), the first factor in (B.3) converges to zero in probability. It remains
to show that the second factors in (B.1) and (B.3) are bounded asymptotically.

Since $\delta \to 0$ (p), for any $a > 0$, we can find $n$ large enough so that $|\delta| < a$
with probability arbitrarily close to one. By assumption (4b), we can choose $M$ large
enough so that for $n$ large enough, the inequality

$\sum_i (e_i - \varepsilon_i)^2 < M^2$

holds with probability arbitrarily close to one. In particular, we then have

$|\tau_i| = |e_i - \varepsilon_i| < M$, $i = 1, \dots, n$.

The second factor in (B.1) is then bounded from above by

$n^{-1}\sum_i \sup\{|g_1(\beta_0 + x\beta + t, \varepsilon_i + s)| : |s| < M, |t| < a\}$,

which converges to the expectation in (4.6) by the strong law of large numbers. Likewise,
the second factor in (B.3) is bounded from above by an i.i.d. average which converges to
the expectation in (4.7). ∎

APPENDIX C: PROOF OF THEOREM 5


Let $s = X'QX/n$, $c = X'QY/n$, $\theta = \Sigma_{xy}$. We need to show that

$n^{1/2}[s^{-1}c - \Sigma^{-1}\theta] = O_p(1)$.

It suffices to show that

$n^{1/2}(c - \theta) = O_p(1)$,  (C.1)

$n^{1/2}(s^{-1} - \Sigma^{-1}) = O_p(1)$.  (C.2)

The left hand side of (C.1) is equivalent to

$n^{-1/2}\sum_i[(x_i - \bar{x})'y_i - E(x - \mu)'y] = n^{-1/2}\sum_i[(x_i - \mu)'y_i - E(x - \mu)'y] - n^{1/2}(\bar{x} - \mu)' \cdot \bar{y}$.  (C.3)

By the central limit theorem, the first term in (C.3) converges to a multinormal distribution
under assumption (6a). The first factor in the second term converges to a multinormal
distribution; the second factor converges to $E(y)$. Therefore (C.1) is satisfied.